✨ feat: add pod status checking to distinguish pending vs running jobs#51
…n_progress misclassification
…er and status of job is returned as unknown
- Extracted Celery and direct submission logic into helper functions.
- Fixed a blocking bug by offloading Celery's to a thread pool executor.
- Normalized generation to ensure consistent returns on failure.
- Improved readability and maintainability by simplifying the status mapping logic.
Avoid blocking the event loop while waiting for Celery response timeout.
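A minimal sketch of the pattern these commits describe, assuming the blocking call is something like a Celery `AsyncResult.get(timeout=...)`. `blocking_wait` is a hypothetical stand-in so the example runs without a broker. `asyncio.to_thread` (Python 3.9+) hands it to the loop's default thread pool executor, and any failure is normalized to `None`. Real code would more likely catch Celery's timeout exception specifically rather than `Exception`.

```python
import asyncio
import contextlib
import time
from typing import Optional, Tuple


def blocking_wait(seconds: float) -> str:
    """Hypothetical stand-in for a blocking call such as a Celery result wait."""
    time.sleep(seconds)  # raises ValueError for negative values
    return "SUBMITTED"


async def submit_without_blocking(seconds: float) -> Optional[str]:
    """Run the blocking wait in a worker thread so the event loop stays free."""
    try:
        # asyncio.to_thread (Python 3.9+) uses the loop's default ThreadPoolExecutor.
        return await asyncio.to_thread(blocking_wait, seconds)
    except Exception:
        # Normalize any failure to a single, predictable return value.
        return None


async def main() -> Tuple[Optional[str], int]:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    result = await submit_without_blocking(0.1)
    beat.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await beat
    # ticks > 0 only because the loop kept running during the 0.1 s wait.
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If `blocking_wait` were called directly inside the coroutine, the heartbeat task would not get a chance to run until the wait finished.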
Deploying this to an OpenShift cluster to test...
@fMurugi, please use black and ruff to format the changes introduced in this PR.
Example for the gfmstudio/fine_tuning/core/kubernetes.py file:
black gfmstudio/fine_tuning/core/kubernetes.py && ruff check --select I --fix gfmstudio/fine_tuning/core/kubernetes.py
…tial-studio-core into feat/check-pod-phase
Returns
-------
str or None
    The pod phase: 'Running', 'Pending', 'Succeeded', 'Failed', or 'Unknown'; None if no pod is found.
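For reference, a minimal sketch of the lookup this docstring describes. It assumes the pods come from `kubectl get pods -l <selector> -o json` (or the `oc` equivalent on OpenShift), where each item reports its phase under `status.phase`. The function name `get_pod_phase` and the choice of the most recently created pod are assumptions; the PR's own implementation may query the cluster differently.

```python
import json
from typing import Any, Dict, List, Optional


def get_pod_phase(pods_json: str) -> Optional[str]:
    """Return the phase of the newest pod in `kubectl get pods -o json` output.

    Returns 'Running', 'Pending', 'Succeeded', 'Failed', or 'Unknown';
    None if no pod is found.
    """
    items: List[Dict[str, Any]] = json.loads(pods_json).get("items", [])
    if not items:
        return None
    # creationTimestamp is an RFC 3339 UTC string such as "2024-05-01T10:00:00Z",
    # so lexicographic order matches chronological order.
    newest = max(
        items,
        key=lambda pod: pod.get("metadata", {}).get("creationTimestamp", ""),
    )
    # Treat a pod without a reported phase as 'Unknown'.
    return newest.get("status", {}).get("phase", "Unknown")


if __name__ == "__main__":
    sample = json.dumps({"items": [
        {"metadata": {"creationTimestamp": "2024-05-01T10:00:00Z"}, "status": {"phase": "Succeeded"}},
        {"metadata": {"creationTimestamp": "2024-05-02T09:00:00Z"}, "status": {"phase": "Pending"}},
    ]})
    print(get_pod_phase(sample))  # Pending
```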
This is a great introduction. For easier maintenance in the future, it would be ideal to create a reference schema class for these statuses, so there is only one place to change or update the values if they ever change.
Something like ...
from enum import Enum


class PodStatusPhase(str, Enum):
    RUNNING = "Running"
    PENDING = "Pending"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    UNKNOWN = "Unknown"
    NONE = "None"  # adjust to what None is here
Then, when returning the output from k8s:
if result:
    try:
        return PodStatusPhase(status_str)
    except ValueError:
        # Update this to handle what states would error
        return PodStatusPhase.UNKNOWN
return None
Wherever your code needs one of these values, you can then reference the enum member directly, e.g.
PodStatusPhase.FAILED
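Building on that suggestion, a sketch of how the enum could drive the status mapping this PR is about, so that a `Pending` pod is no longer reported as in progress. The job-status strings (`pending`, `in_progress`, `finished`, `failed`, `unknown`), the `parse` helper, and `job_status_from_phase` are hypothetical. This version also represents "no pod found" with Python's `None` instead of a `NONE` member, and it treats that case as pending, which is an assumption.

```python
from enum import Enum
from typing import Optional


class PodStatusPhase(str, Enum):
    """Kubernetes pod phases, kept in one place."""
    RUNNING = "Running"
    PENDING = "Pending"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    UNKNOWN = "Unknown"

    @classmethod
    def parse(cls, raw: Optional[str]) -> Optional["PodStatusPhase"]:
        """Convert a raw phase string; unrecognized values fall back to UNKNOWN."""
        if raw is None:
            return None
        try:
            return cls(raw)
        except ValueError:
            return cls.UNKNOWN


# Hypothetical job-status labels; the real names live in the PR's status mapping.
JOB_STATUS_BY_PHASE = {
    PodStatusPhase.PENDING: "pending",
    PodStatusPhase.RUNNING: "in_progress",
    PodStatusPhase.SUCCEEDED: "finished",
    PodStatusPhase.FAILED: "failed",
    PodStatusPhase.UNKNOWN: "unknown",
}


def job_status_from_phase(raw_phase: Optional[str]) -> str:
    """Map a pod phase to a job status so a pending pod is not reported as running."""
    phase = PodStatusPhase.parse(raw_phase)
    if phase is None:
        # Assumption: no pod yet means the job has not started.
        return "pending"
    return JOB_STATUS_BY_PHASE[phase]
```

Because the enum mixes in `str`, members still compare equal to the raw strings returned by Kubernetes (`PodStatusPhase.FAILED == "Failed"`), so existing string comparisons keep working during a gradual migration.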
Thanks for this, Wanjiru. This will move into a new task for another sprint.
… parse the env variables
Summary
Related Issue (optional)
How to test this PR?
Screenshots / Logs (optional)
Checklist